A practical guide and software for analysing pairwise comparison experiments
The most popular strategies for capturing subjective judgments from humans involve constructing a unidimensional relative measurement scale that represents order preferences or judgments about a set of objects or conditions. This information is generally captured by direct scoring, either in the form of a Likert or cardinal scale, or by comparative judgments in pairs or sets. Among these approaches, pairwise comparisons are becoming increasingly popular because of the simplicity of the experimental procedure. However, this
strategy requires non-trivial data analysis to aggregate the comparison ranks
into a quality scale and analyse the results, in order to take full advantage
of the collected data. This paper explains the process of translating pairwise
comparison data into a measurement scale, discusses the benefits and
limitations of such scaling methods, and introduces publicly available Matlab software. We improve on existing scaling methods by introducing outlier analysis, providing methods for computing confidence intervals and statistical testing, and introducing a prior that reduces estimation error when the number of observers is low. Most of our examples focus on image quality assessment.
Comment: Code available at https://github.com/mantiuk/pwcm
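To make the scaling step concrete, here is a minimal sketch of converting pairwise comparison counts into a quality scale by maximum likelihood under Thurstone Case V assumptions, one of the scaling approaches the paper discusses. The authors' actual implementation is the Matlab code linked above; the Python function below and its name `scale_pairwise` are illustrative only.

```python
# Minimal Thurstone Case V scaling sketch (illustrative, not the paper's code)
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def scale_pairwise(C):
    """C[i, j] = number of times condition i was preferred over condition j.
    Returns quality scores in JND-like units, with q[0] anchored at 0."""
    n = C.shape[0]

    def neg_log_likelihood(q_free):
        q = np.concatenate(([0.0], q_free))       # fix the first score to 0
        d = q[:, None] - q[None, :]               # pairwise score differences
        p = norm.cdf(d).clip(1e-9, 1 - 1e-9)      # P(i preferred over j)
        return -np.sum(C * np.log(p))

    res = minimize(neg_log_likelihood, np.zeros(n - 1), method="L-BFGS-B")
    return np.concatenate(([0.0], res.x))

# Example: 3 conditions, 20 comparisons per pair
C = np.array([[0, 15, 18],
              [5,  0, 12],
              [2,  8,  0]])
print(scale_pairwise(C))
```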
Distilling Style from Image Pairs for Global Forward and Inverse Tone Mapping
Many image enhancement or editing operations, such as forward and inverse
tone mapping or color grading, do not have a unique solution, but instead a
range of solutions, each representing a different style. Despite this, existing
learning-based methods attempt to learn a unique mapping, disregarding this
style. In this work, we show that information about the style can be distilled
from collections of image pairs and encoded into a 2- or 3-dimensional vector.
This gives us not only an efficient representation but also an interpretable
latent space for editing the image style. We represent the global color mapping
between a pair of images as a custom normalizing flow, conditioned on a
polynomial basis of the pixel color. We show that such a network is more
effective than PCA or VAE at encoding image style in low-dimensional space and
lets us obtain an accuracy close to 40 dB, which is about 7-10 dB improvement
over the state-of-the-art methods.
Comment: Published in the European Conference on Visual Media Production (CVMP '22).
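The paper's mapping is a custom normalizing flow, but the notion of a global color mapping built on a polynomial basis of the pixel color can be sketched with a plain least-squares fit. Everything below (the function names, the degree-2 basis, the PCA remark) is an illustrative assumption, not the authors' method.

```python
# Illustrative sketch of a global polynomial color mapping between an image pair
import numpy as np

def poly_basis(rgb):
    """Degree-2 polynomial basis of the pixel color; rgb is (N, 3) in [0, 1]."""
    r, g, b = rgb.T
    return np.stack([np.ones_like(r), r, g, b,
                     r * r, g * g, b * b, r * g, r * b, g * b], axis=1)

def fit_global_mapping(src, dst):
    """Least-squares fit of dst ~= poly_basis(src) @ M for one image pair,
    with both images flattened to (N, 3) arrays of pixel colors."""
    M, *_ = np.linalg.lstsq(poly_basis(src), dst, rcond=None)
    return M                                     # (10, 3) mapping coefficients

# A 2- or 3-dimensional style vector could then be obtained, e.g., by PCA over
# the flattened M matrices of many image pairs (the paper instead learns the
# latent jointly with a conditional normalizing flow).
src = np.random.rand(10000, 3)
dst = np.clip(src ** 1.8, 0, 1)                  # a made-up "style": gamma curve
M = fit_global_mapping(src, dst)
print(np.abs(poly_basis(src) @ M - dst).mean())  # fitting error of the mapping
```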
Single-frame Regularization for Temporally Stable CNNs
Convolutional neural networks (CNNs) can model complicated non-linear
relations between images. However, they are notoriously sensitive to small
changes in the input. Most CNNs trained to describe image-to-image mappings
generate temporally unstable results when applied to video sequences, leading
to flickering artifacts and other inconsistencies over time. In order to use CNNs for video material, previous methods have relied on estimating dense frame-to-frame motion information (optical flow) in the training and/or inference phase, or on recurrent learning structures. We take a
different approach to the problem, posing temporal stability as a
regularization of the cost function. The regularization is formulated to
account for different types of motion that can occur between frames, so that
temporally stable CNNs can be trained without the need for video material or
expensive motion estimation. The training can be performed as a fine-tuning
operation, without architectural modifications of the CNN. Our evaluation shows
that the training strategy leads to large improvements in temporal smoothness.
Moreover, for small datasets the regularization can help in boosting the
generalization performance to a much larger extent than what is possible with
na\"ive augmentation strategies
HDR-VDP-3: A multi-metric for predicting image differences, quality and contrast distortions in high dynamic range and regular content
High-Dynamic-Range Visual-Difference-Predictor version 3, or HDR-VDP-3, is a
visual metric that can fulfill several tasks, such as full-reference
image/video quality assessment, prediction of visual differences between a pair
of images, or prediction of contrast distortions. Here we present a high-level
overview of the metric, position it with respect to related work, explain the
main differences compared to version 2.2, and describe how the metric was
adapted for the HDR Video Quality Measurement Grand Challenge 2023.
A Model of Local Adaptation
The visual system constantly adapts to different luminance levels when viewing natural scenes. The state of visual adaptation is the key parameter in many visual models. While the time-course of such adaptation is well understood, little is known about the spatial pooling that drives the adaptation signal. In this work we propose a new empirical model of local adaptation that predicts how the adaptation signal is integrated in the retina. The model is based on psychophysical measurements on a high dynamic range (HDR) display. We employ a novel approach to model discovery, in which the experimental stimuli are optimized to find the most predictive model. The model can be used to predict the steady state of adaptation, as well as conservative estimates of the visibility (detection) thresholds in complex images. We demonstrate the utility of the model in several applications, such as perceptual error bounds for physically based rendering, determining the backlight resolution for HDR displays, measuring the maximum visible dynamic range in natural scenes, simulation of afterimages, and gaze-dependent tone mapping.
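One common way to express such spatial pooling of the adaptation signal, shown here only as a hedged sketch, is a Gaussian-weighted average of log-luminance. The paper derives the pooling empirically from optimized stimuli, so the kernel shape and size below are placeholders, not the fitted model.

```python
# Placeholder sketch of local adaptation as Gaussian pooling of log-luminance
import numpy as np
from scipy.ndimage import gaussian_filter

def local_adaptation(luminance, sigma_deg, ppd):
    """luminance: (H, W) map in cd/m^2; sigma_deg: pooling radius in visual
    degrees; ppd: pixels per visual degree of the viewing setup."""
    log_l = np.log10(np.maximum(luminance, 1e-4))      # pool in the log domain
    return 10.0 ** gaussian_filter(log_l, sigma=sigma_deg * ppd)

L = np.full((256, 256), 100.0)                         # 100 cd/m^2 background
L[100:150, 100:150] = 1000.0                           # a bright patch
L_adapt = local_adaptation(L, sigma_deg=0.5, ppd=45)
print(L_adapt.min(), L_adapt.max())                    # adaptation spreads out
```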
Temporal Resolution Multiplexing: Exploiting the limitations of spatio-temporal vision for more efficient VR rendering.
Rendering in virtual reality (VR) requires substantial computational power to generate 90 frames per second at high resolution with good-quality antialiasing. The video data sent to a VR headset requires high bandwidth, achievable only on dedicated links. In this paper we explain how rendering requirements and transmission bandwidth can be reduced using a conceptually simple technique that integrates well with existing rendering pipelines. Every even-numbered frame is rendered at a lower resolution, and every odd-numbered frame is kept at high resolution but is modified to compensate for the previous loss of high spatial frequencies. When the frames are seen at a high frame rate, they are fused and perceived as a high-resolution, high-frame-rate animation. The technique relies on the limited ability of the visual system to perceive high spatio-temporal frequencies. Despite its conceptual simplicity, correct execution of the technique requires a number of non-trivial steps: the display's photometric temporal response must be modeled, flicker and motion artifacts must be avoided, and the generated signal must not exceed the dynamic range of the display. Our experiments, performed on a high-frame-rate LCD monitor and OLED-based VR headsets, explore the parameter space of the proposed technique and demonstrate that its perceived quality is indistinguishable from full-resolution rendering. The technique is an attractive alternative to reprojection and to resolution reduction of all frames.
Funding: European Research Council; European Union Horizon 2020 research and innovation programme.
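The compensation step can be sketched as follows, under the simplifying assumption that the visual system averages two consecutive frames: if the even frame carries only the low spatial frequencies, the odd frame is set so that the temporal average of the pair reproduces the intended content. This numpy sketch ignores the display response modeling described above; all names are illustrative.

```python
# Sketch of the temporal resolution multiplexing compensation step
import numpy as np
from scipy.ndimage import zoom

def trm_pair(f_even, f_odd, factor=2):
    """One low-resolution frame plus one compensated full-resolution frame."""
    # Down/upsampling removes high spatial frequencies from the even frame
    low = zoom(zoom(f_even, 1 / factor, order=1), factor, order=1)
    # Compensate the odd frame so that (low + high) / 2 ~= the intended frame;
    # clipping keeps the signal within the displayable range
    high = np.clip(2.0 * f_odd - low, 0.0, 1.0)
    return low, high

f_even = np.random.rand(128, 128)
f_odd = f_even.copy()                            # static scene, for simplicity
low, high = trm_pair(f_even, f_odd)
print(np.abs((low + high) / 2 - f_odd).mean())   # fused pair vs. original
```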
High Dynamic Range Imaging Technology.
In this lecture note, we describe high dynamic range (HDR) imaging systems. Such systems are able to represent a much larger range of luminance and, typically, a larger range of colors than conventional standard dynamic range (SDR) imaging systems. The larger luminance range greatly improves the overall quality of visual content, making it appear much more realistic and appealing to observers. HDR is one of the key technologies of the future imaging pipeline, and it will change the way digital visual content is represented and manipulated.
Depth from HDR: Depth Induction or Increased Realism?
Many people who first see a high dynamic range (HDR) display get the impression that it is a 3D display, even though it does not produce any binocular depth cues. Possible explanations of this effect include contrast-based depth induction and the increased realism due to the high brightness and contrast that makes an HDR display “like looking through a window”. In this paper we test both of these hypotheses by comparing the HDR depth illusion to real binocular depth cues using a carefully calibrated HDR stereoscope. We confirm that contrast-based depth induction exists, but it is a vanishingly weak depth cue compared to binocular depth cues. We also demonstrate that for some observers, the increased contrast of HDR displays indeed increases the realism. However, it is highly observer-dependent whether reduced, physically correct, or exaggerated contrast is perceived as most realistic, even in the presence of the real-world reference scene. Similarly, observers differ in whether reduced, physically correct, or exaggerated stereo 3D is perceived as more realistic. To accommodate the binocular depth perception and realism concept of most observers, display technologies must offer both HDR contrast and stereo personalization.